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Abstract. Approximate Message Passing (AMP) has been shown to be an excellent 
statistical approach to signal inference and compressed sensing problem. The AMP 
framework provides modularity in the choice of signal prior; here we propose a 
hierarchical form of the Gauss-Bernouilli prior which utilizes a Restricted Boltzmann 
Machine (RBM) trained on the signal support to push reconstruction performance 
beyond that of simple i.i.d. priors for signals whose support can be well represented by 
a trained binary RBM. We present and analyze two methods of RBM factorization 
and demonstrate how these affect signal reconstruction performance within our 
proposed algorithm. Finally, using the MNIST handwritten digit dataset, we 
show experimentally that using an RBM allows AMP to approach oracle-support 
performance. 
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Over the past decade, a groundswell in research has occurred in both difficult inverse 
problems, such as those encountered in Compressed Sensing (CS) |1], and in signal 
representation and classification via deep networks. In recent years. Approximate 
Message Passing (AMP) [2] has been shown to be a near-optimal, efficient, and extensible 
application of belief propagation to solving inverse problems which admit a statistical 
description. 

While AMP has enjoyed much success in solving problems for which an i.i.d. signal 
prior is known, only a few works have investigated the application of AMP to more 
complex, structured priors. Utilizing such complex priors is key to leveraging many of 
the advancements recently seen in statistical signal representation. Techniques such as 
GrAMPA [3] and Hybrid AMP jl] have shown promising results when incorporating 
correlation models directly between the signal coefficients, and in fact the present 
contribution is similar in spirit to Hydrid AMP. 

Another possible approach is to not attempt to model the correlations directly, 
but instead to utilize a bipartite construction via hidden variables, as in the Restricted 
Boltzmann Machine (RBM) |5l [6]. The RBM is an example of latent variable model, 
which we distinguish from the fully visible models considered by Pi- Such latent 
variable models can become quite powerful, as they admit interpretations such as feature 
extractors and multi-resolution representations. As we will show, if a binary RBM can 
be trained to model the support patterns of a given signal class, then the statistical 
description of the RBM easily admits its use within AMP. This is particularly interesting 
since RBMs are the building blocks of deep belief networks |7j and have enjoyed a surge of 
renewed interest over the past decade, partly due to the development of efficient training 
afgorithms (e.g. Contrastive Divergence (CD) [ 0 ]). The present paper demonstrates the 
first steps in incorporating deep learned priors into generalized linear problems. 

2. Approximate message passing for compressed sensing 

Loopy Belief Propagation (BP) is a powerful iterative message passing algorithm for 
graphical models [Eli. However, it presents two main drawbacks when applied to 
highly-connected, continuous-variable problems, as in CS: first, the need to work with 
continuous probability distributions; and second, the necessity to iterate over one such 
probability distribution for each pair of variables. These problems can be addressed 
by projecting the distributions onto their first two moments and by approximating the 
messages, which exist on the edges of the factor graph representation of the inference 
problem, by the marginals existing on the nodes of the factor graph. Applying both to 
BP for the CS problem, one obtains the AMP iteration. 

AMP has been shown to be a very powerful algorithm for CS signal recovery, 
especially in the Bayesian setting where an a priori model of the unknown signal is 
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given. In CS, one has the following forward model to obtain set of M observations y, 

N 

y^ = '^ FniXi + Wrj where ^ M (0, A), (1) 

i=l 

where F is an M x A matrix, for M N, representing linear observations of an 
unknown signal x which are then corrupted by an additive zero-mean white Gaussian 
noise (AWGN) of variance A. The measurement rate a = M/N is of particular interest 
in determining the difficulty of this inverse problem. In the present work, we use 
subscript notation to denote the individual coefficients of vectors, i.e. i/rj refers to the 
coefficient of y and the double-subscript notation to refer to individual matrix elements 
in row-column order. 

We can use AMP to estimate a factorization, up to the first two moments, of the 
posterior distribution P(x|F,y) oc Po(x)P(y|F,x), where P(y|F,x) is the likelihood 
of an observation from the AWGN channel and Fo(x) is a prior on the signal. If the 
posterior is trivially factorized, or if it can be approximated by a factorized distribution, 
one can estimate the unknown signal by averaging over the posterior, 

= ydxi XiP{xi\F,y), \fi, (2) 

which is the minimum mean squared error (MMSE) estimate of x. This is in contrast 
to utilizing a maximum a posteriori (MAP) approach to the solution of this inverse 
problem. We refer the reader to n cni cn [n US!, and in particular to for the 
present notation and the derivation of AMP from BP and now give directly the iterative 
form of the algorithm. Given an estimate of the factorized posterior mean a* and variance 
Cj for each element of x, a single step of AMP iteration reads 
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( 6 ) 


These terms can be interpreted as follows. The variables and {G,} represent the 
first and second moments, respectively, of the marginalized messages on the M factors of 
the factor graph. The variables {Fj} and {Ef} represent the first and second moments, 
respectively, of the marginalized messages on the N signal variables of the factor graph. 
As such, these represent the AMP field on the signal, which are the observational beliefs 
about the continuous factorized PDF at each signal variable. To find the final factorized 
posterior of the signal, up to the two moment approximation, we must augment this 
AMP field by the signal prior. Once the new values of Ri and are computed, the 
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new estimates of the posterior mean and variance, a* and q, given the prior Po{x), are 
calculated as 


Jdxi 

4^^ = J dxi ^Po{xi)M{xi;Ri,J:‘^i) - (o-+^)^ 


( 7 ) 

( 8 ) 


where Zi = J dxi Po{xi)J\f{xi; Ri,'E‘^) is a normalization constant, commonly referred 
to as a partition function in statistical physics. The final form of these equations for 
differing values of Po{xi) are given in na. As seen in Eqs. 0 and 0 , in order to use 
the AMP framework, one must know some information about the class of signals from 
which X is drawn. Commonly in CS, we are interested in the case where the signal x is 
sparse, that is, very few of its coefficients are non-zero. The concept of sparsity implies 
an assumption that the amount of information required to represent x is actually far 
less than its dimensionality would admit. Essentially, a sparse prior on x is extremely 
informative due to its low-entropic nature. 

Much of the CS literature focuses on convex approaches to this inverse problem. 
And so, a convex ii norm is used as a regularizer to bias solutions to the inverse problem 
towards sparsity. Within the probabilistic framework, this corresponds to a selection of 
a Laplace distribution for PQ{xi). 

However, AMP is not restricted to convex priors, but can utilize non-convex priors 
of arbitrary complexity. For example, one can use a two-mode Gauss-Bernoulli (GB) 
prior to model sparse signals (as was considered in detail in [m [Ellin!) such as 


Po(x) = PJPo(a:i) = n X] Poivi)Po{xi\vi), 

i i PiG{0,l} 


( 9 ) 


where we have introduced Bernoulli random variables Vi such that E [uj] = p, on which 
the values {xi} are conditioned as Po^xfvi = 1) = N'{p,a^), and Po(^ibi = 0) = S{xi), 
where fi(-) is the zero-centered Dirac delta distribution. This leads to the final expression 
for the prior, 

Po{^;p) = Yl(^il-p)5{xi) + -j==e-^-^^ . ( 10 ) 

The GB prior has two possible modes: a zero mode and a non-zero one. If we denote the 
set of non-zero coefficients of x, those for which n, = 1, to be the support, then the GB 
prior models both the on- and off-support probability. For p G [0,1], the distribution 
splits the probabilistic weight between a hard (deterministic) constraint of x* = 0 and 
a normal distribution for arbitrary parameters that model the on-support coefficients. 
For a fixed on-support distribution, here for fixed p and cr^, the value of p controls 
how informative Po(x; p) is. The more informative this prior, the fewer measurements, 
smaller a, are required in order to successfully infer x from y. Of course, if x is not truly 
drawn from Pq ( x ; p), more measurements will be required to account for the mismatch 
between the true signal and the assumed signal model, as is the case with signals which 
are not truly sparse but merely compressible. 
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If, rather than a fixed probability of being on-support for all sites, we instead have 
a local probability for each specific site to be on-support, we can write an independent, 
but non-identically distributed, prior. For conciseness, we refer to this property as 
“non-i.i.d.” in the remainder of this paper. We easily generalize Eq. (10) to 

pi 


Po(x; {pi}) = n (+ 
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( 11 ) 


This change in the prior must also be reflected in the computation of the means 
and variances used in in Eqs. 0 and ([^. In fact, the partition function becomes 


Zi = {1- Pi) 


+ Pi~ 


1 


1 


exp 


2E? 


\/27r (Sf + ct 2 ) 

= (1 - p,)Z^ + p,zr 


exp 


(ft - 

21X1 + a‘) 


( 12 ) 


where we can write the partition using two sub-partition terms related to the off-support 
{Zf) and on-support {Zf^) probabilities. From this partition function, for a given setting 
of Ri and S? (which result from the AMP evolution), we can write the posterior means 
and variances, a and c according to the fixed GB prior parameters {pi}, p, and 

A nice feature of this two-mode prior is that it also admits a natural estimation of 
the probability of a particular coefficient to be on- or off-support. Specifically, at a given 
point in the AMP evolution, we have Prob^^^ [uj = 1 ] = and Prob^^^ [n* = 0] = 
leading to 
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The additional information provided by AMP results in a modified support probability 
P(v) oc P(v) Explicating P(v) allows us to envision more complex support 

models for the coefficients of x. The previous model assumes the independence between 
the coefficients of x, however, the existence of dependencies, now well-acknowledged 
for many natural signals as structured sparsity, can be leveraged through joint models. 
In the variational Bayesian context, we cite [H [TSl [16], which consider neighborhood 
probabilities, Markov chains, and so-called Boltzmann machines, respectively, as generic 
support models. In similar vein, we propose the use of a binary RBM as a joint support 
model. In contrast with the models of 0115], the RBM model can provide a more 
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accurate modeling of support correlations. Also, the RBM model can be trained at low 
cost jE], which is the main bottleneck of the general Boltzmann machine model used in 

US], 

3. Binary restricted Boltzmann machines 

An RBM is an energy based model defined over both visible and a set of latent, or 
hidden, variables. From the perspective of statistical physics, the RBM can be viewed 
as a boolean Ising model existing on a bipartite graph. The joint probability distribution 
over the visible and hidden layers for the RBM is given by 

P(v, h) oc , (15) 

where the energy £'(v,h) reads 

E(v, h) = - ^ blv, - b% - Y . (16) 

^ 3 hj 

The RBM model is described by the two sets of biasing coefficients b'" and on the 
visible and hidden layers, respectively, and the learned connections between the layers 
represented by the matrix W. 

In the sequel, we leverage the connection between the RBM and the well known 
results of statistical physics to discuss a simplification of the RBM under the so-called 
mean-field approximation in both the Reorder and 2"‘^-order approximation, known 
as Thouless-Anderson-Palmer (TAP) approximation [T71 ITHl [19], in order to obtain 
factorizations over the visible and hidden layers from this joint distribution. 

3.1. Mean-field approximation of the RBM 

Given the value of an Energy-function E'dx}), also called Hamiltonian in physics, a 
standard technique is to use the Gibbs variational approach where the Gibbs free energy 
T is minimized over a trial distribution, Pvar, with 

- Scbta (Pv„) (17) 

where ( • denotes the average over distribution Pyar and Acibbs is the Gibbs 

entropy. 

It is instructive to first review the simplest variational solution, namely, the Re¬ 
order naive mean-field (NMF) approximation, where Pyar = Y\iQi{.^i)- Within this 
ansatz, a classical computation shows that the free energy, in the case of the binary 
RBM, reads as 

•i^NMF (v, h) = - '^IP.Vi-^ h)hj - WijVihj 

* i (*,i> 

+ 53[H(®.) + H(l-fij)] 

i 

+ E[“G)+H(1-A,)], 

3 


( 18 ) 
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where H{x) = x ln(x) and h* = E [vj] = Prob [vi = 1]. Since the hidden variables are 
also binary, this identity is equally true for hj. 

The hxed points of the means of both the visible and hidden units are of particular 
interest. With these hxed points we can calculate a factorization for both P(v) and 
P(h). First, we look at the derivatives of the NMF free energy for the RBM w.r.t. 
the visible and hidden sites, which, when evaluated at the critical point, gives us the 
hxed-point conditions for the expected values of the variables 


Vi = sigm(6J + Wij/jj), 

j 

(19) 

hj = sigm(6j + '^WijVi), 

(20) 


i 


where sigm (x) = [1 + These equations are in line with the assumed NMF fixed- 

point conditions used for finding the “site activations” given in the RBM literature. 
In fact, they are often used as a fixed-point iteration (FPI) to find the minimum free 
energy. 

We will now modify the NMF solution via a 2^'^-order correction, as was originally 
shown for RBMs in 123 using TAP. The TAP approach is a classical tool in statistical 
physics and spin glass theory which improves on the NMF approximation by taking into 
account further correlations. In many situations, the improvement is drastic, making 
the TAP approach very popular for statistical inference (TH]. There are many ways 
in which the TAP equations can be presented. We shall refer, for the sake of this 
presentation, to the known results of the statistical physics community. One approach 
to derive the TAP approximation is to recognize that the NMF free energy is merely 
the hrst term in a perturbative expansion in power of the coupling constants W, as was 
shown by Plefka |2I] , and to keep the 2"‘^-order term. Alternatively, one may start with 
the Bethe approximation and use the fact that the system is densely connected [mill]. 
Proceeding according to Plefka, the TAP free energy reads 

(v, h) = (v, h) - i J]] WljVihj, (21) 

{i,j) 


where we have denoted the variances of hidden and visible variables, h and v, 
respectively, as hj = E [/j|] — E = hj — Jij and A = E [n^] — E — Vi — vf. 
Repeating the extremisation, one now finds 


Vi = sigm bj + 


hj = sigm Uj + E E 


( 22 ) 


(23) 


Eqs. (22) and (23), often called the TAP equations, can be seen are an extension 
of the mean field iteration of Eqs. (19) and (20). The additional term is called the 
Onsager retro-action term in statistical physics m- In fact, these are the tools one uses 
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in order to derive AMP itself. Given an RBM model, we now have two approximated 


solutions to obtain the equilibrium marginal through the iteration of either Eqs. (19) 


and (20) or Eqs. ( 22 ) and (23). 


Because of the sigmoid functions in the hxed-point conditions, iterating on these 
fixed points will not wildly diverge. However, it is possible that such an FPI will arrive 
at one of the two trivial solutions for the factorization, either the ground-state of the 
field-less RBM or i(sign(b'') -|-1). Whether or not the FPI arrives at these trivial points 
relies on the balance between the evolution of the FPI and the contributing fields. It 
may also enter an oscillatory state, especially if the learned RBM couplings are too 
large in magnitude, reducing the accuracy of the Plefka expansion which assumes small 
magnitude couplings. We shall see, however, that this FPI works extremely well in 
practice. 


4. RBMs for AMP 

The scope of this work is to use the RBM model within the AMP framework and to 
perform inference according to the graphical model depicted in Fig. Here, we would 
like to utilize the binary RBM to give us information on each site’s likelihood of being 
on-support, that is, its probability to be a non-zero coefficient. This shoehorns nicely 


into our sparse GB prior as in Eq. (10). 


Since, in the case of an RBM, P{vi) has the classical exponential form of an energy- 
based model, we see from Eq. (13) that the information provided by AMP simply 


amounts to an additional local bias on the visible variables equal to In 7 *. That is, the 
AMP-modified RBM free energy is simply the introduction of an additional field term 
along with a constant bias, 

j-RBM-AMP ^ ^ jrRBM ^ h) - + C . (24) 

i 

As we can see, this field effect only exists on the visible layer of the RBM. Because of 
this, the AMP framework does not put any extra influence on the hidden layer, but only 
on the visible layer. Thus, the fixed point of the hidden layer means is not influenced by 


AMP, but the visible ones are. With respect to (19) and ( 22 ), the AMP-modified fixed 


points of the the visible variable means contain one additional additive term within the 
sigmoid, giving 


Vi = 


sigm -F ln 7 i + j , 


for the NMF-based fixed point. For the TAP we have 
Vi = sigm 


1^6- + In 7i + ^ W,jh, + (i - fij) j 


(25) 


(26) 


J ^ ' J 

The most direct approach for factorizing the AMP infinenced RBM is to construct 
a fixed-point iteration (FPI) using the NMF or TAP fixed-point conditions. We give 
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Prior Side Observation Side 



h W V X F y 


Figure 1. Graphical representation of the statistical dependencies of the proposed 
RBM-AMP: The right side represents observational side, with linear constraints, while 
the left side represent the RBM prior on the support of the signal (see text). 


the final construction of the AMP algorithm with the RBM support prior in Alg. 
[T] using this approach. This approach can be understood in terms of its graphical 
representation given in Fig. which shows the network of statistical dependencies from 
the observations to the hidden RBM hidden units. Given some initial condition for v 
and h, we can successively estimate the visible and hidden layers via their respective 
fixed-point equations. Empirically, we have found that the hidden-layer variables should 
be initialized to zero, while the visible side variables can be initialized to zero, uniformly 
randomly, or by drawing a random initial condition according to the distribution implied 
by the RBM visible bias b''. 

For the FPI schedule, one might intuitively attempt an approach similar to 
persistent contrastive divergence [22] and allow the values of v and h persist throughout 
the AMP FPI, taking only a single FPI step on the RBM magnetizations at each 
AMP iteration. Indeed, we have found this to be the most computationally efficient 
integration of the RBM into the AMP framework. However, we point out that this 
persistent strategy should only begin after a few AMP iterations have been completed. 
In the first iterations, the RBM factorization should instead be allowed to converge 
on a value of v. Persistence of the hidden and visible magnetizations from the first 
iteration leads to poor reconstruction performance, especially when a is small. Because 
the early values of {Iny*} can be quite weak, using only a single update step on the 
RBM magnetizations early in the reconstruction does not adequately enforce the support 
prior. Later, as the AMP fields grow in magnitude, a single-step persistent update of 
the RBM magnetizations can be used to decrease run time. 

Finally, after obtaining the value of v, either by running the RBM FPI until 
convergence or by taking a single step on v, we must infer the correct values {pi}- 
One might at first attempt to use the setting {pi = fij}, however, this is an improper 
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approach for the hierarchical support prior we have proposed. Instead, one should use 

^nzi^z 

n- = - - - - - 

P'1 _ 2^z _ ^nz 

^nz^^z + (1 ~ Vi) ^nz^^z 

z z z z 

to obtain the correct per-pixel sparsity terms to use in conjunction with the standard 
Gauss-Bernoulli form of 0 and ([^. 


sigm (Inui - ln(l - Vi) - Iny^), (27) 


Algorithm 1 AMP with RBM Support Prior 


Input: F, y, W, b'', b^, PersistentStart 
Initialize: a,c,v, {hj = 0, Vj}, {hj = 0, Vj}, Iter = 1 

repeat 

AMP Update on via 

AMP Update on via (|^ 

Calculate {Iny*} via ( fid) ) Mi 
if Iter < PersistentStart then 
(Re)Initialize: {hj,hj = 0, Mj} 

(Re)Initialize: {h* = 0, Vz} 
repeat 



Update {hj,hj} via (20) or (23) 
Update {hi, A} via (25) or (26) 
until Convergence on v 
else 

Update {hj,hj} via (20) or (23) 
Update {vi,Vi} via (25) or (26) 

end if 

Calculate {p*} via 

AMP Update on (aj) using {pi} via ([^ 
AMP Update on {q} using {pi} via 
Iter Iter + 1 
until Convergence on a 


5. Numerical results with MNIST 

To show the efficacy of the proposed RBM-based support prior within the AMP 
framework, we present a series of experiments nsing the MNIST database of handwritten 
digits. Each sample of the database is a 28 x 28 pixel image of a digit with values in the 
range [0,1]. In order to build a binary RBM model of the support of these handwritten 
digits, the training data was thresholded so that all non-zero pixels were given a value 
of 1. We then train an RBM with 500 hidden units on 60,000 training samples from 
the binarized MNIST training set using the sampling-based contrastive divergence (CD- 
1) technique [0] for 100 epochs, averaging the RBM model parameter gradients across 
100-sample minibatches, at a learning rate of 0.005 under the prescriptions of [23] • We 
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Figure 2. Comparison of MNIST reconstruction performance over 300 test set 
digits over the measurement rate a. For both charts, lower curves represent greater 
reconstruction performance. (Left) Percentage of test set digit images successfully 
(MSE < 10“^) recovered. Note that the AMP-RBM model comes very close to the 
oracle reconstruction performance bound for this test set. (Right) Correlation, in 
terms of MCC, between the recovered support and the true digit image support. Here, 
the solid lines represent the average correlation at each tested a, while the solid regions 
represent the range of one standard deviation of the correlations over the test set. 


additionally impose an £2 weight-decay penalty on the magnitude of the elements of W 
at a strength of 0.001. Such penlaties, while also desirable for learning performance [23], 
are necessary for the TAP-based AMP-RBM due to the fact that the Plefka expansion of 
Eq. (21) is reliant on the magnitudes of W being small. For further reading on training 
RBMs, as well as other undirected graphical models, we refer the reader to m- Once 
the generative RBM support model is obtained, we construct the CS experiments as 
follows. For a given measurement rate a, we draw the i.i.d. entries of F from a normal 
distribution of variance l/\fN. The linear projections Fx are subsequently corrupted 
with an AWGN of variance A = 10“® to form y. In all experiments we utilize the first 
300 digit images from the MNIST test set. 

We compare the following approaches in Fig. First, we show the reconstruction 
performance of AMP using an i.i.d. GB prior (AMP-GB), assuming that the true 
image-wide empirical p is given as a parameter for each specific test image. Next, we 
demonstrate a simple modification to this procedure: the GB prior is assumed to be non- 
i.i.d. and the values of {pi\ are empirically estimated from the training samples as the 
probability of each pixel to be non-zero. We expect that our proposed approach should at 
least perform as well as non-i.i.d. AMP-GB, as this same information should be encoded 
in b'" for a properly trained RBM model. Hence, this approach should correspond to 
an RBM with zero couplings. We also show the performance of the proposed approach: 
AMP used in conjunction with the RBM support model. We present results for both 
the NMF and TAP factorizations of the RBM. For the AMP-RBM approaches, we start 
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Figure 3. Visual comparison of reconstructions for four test digits across a for the 
same experimental settings. The rows of each box, from top to bottom, correspond 
to the reconstructions providied by i.i.d. AMP-GB, non-i.i.d. AMP-GB, the proposed 
approach with NMF RBM factorization, and the proposed approach with TAP RBM 
factorization, respectively. The columns of each box, from left to right, represent 
the values a = 0.025,0.074,0.123,0.172,0.222,0.271,0.320,0.369,0.418,0.467. The 
advantages provided by the proposed approach are clearly seen by comparing the 
last row to the first one. The digits shown have p = 0.342 (top left), p = 0.268 
(bottom left), p = 0.214 (top right), and p = 0.162 (bottom right). The vertical blue 
line represents the a = p oracle exact-reconstruction boundary for each reconstruction 
task. 


persistence after AMP 50 iterations. For all tested approaches, we assume that A is 
known to the reconstruction algorithms, as well as the prior parameters fi and This 
is not a strict requirement, however, as channel and prior parameters can be estimated 
within the AMP iteration if so desired [H]. 

In the left panel of Fig. we present the percentage of successfully recovered test set 
digit images of the 300 digit images tested. A successful reconstruction is denoted as one 
which achieves MSE < 10“^. It is easily observable from these results that leveraging 
a non-i.i.d. support prior does indeed provide drastic performance improvements, as 
even the simple non-i.i.d. version of AMP-GB recovers significantly more digits than 
i.i.d. AMP-GB. We also see that by using the RBM model of the support, along with 
a TAP-based support probability estimation, we are able to improve upon this simple 
approach. For example, at a = 0.22, by using the RBM support prior in conjunction 
with TAP factorization we are able to recover an additional 56.0% of the test set than 
with no support information, and an additional 29% than when using only empirical per- 
coefficient support probabilities. The fact that both RBM-AMP approaches improve on 
non-i.i.d. AMP-GB demonstrates that the learned support correlations are genuinely 
providing useful information during the AMP CS reconstruction procedure. We also 
note that the 2"‘^-order TAP factorization provides reconstruction performance on this 
test set which is either equal to, or better than, the NMF factorization, at the cost of 
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Figure 4. (Left) Test set reconstruction performance for varying numbers of RBM 
hidden units. One can clearly see in increase in reconstruction performance as the 
RBM model complexity increases. (Right) Test set reconstruction performance at 
a = 0.2 as a function of the number of epochs, from 5 to 1000, used to learn a 500 
hidden unit RBM model. Note the sensitivity of the NMF RBM factorization to model 
over-fitting. 


an additional matrix-vector multiplication at each iteration of the RBM factorization. 
To demonstrate how these approaches compare with maximum achievable performance, 
we also show the support oracle performance, which corresponds to the percentage of 
test samples for which p < a. The proposed AMP-RBM approach, for the given RBM 
support model, closes the gap to oracle performance. 

In the right panel of Fig. we show the performance of the support estimation 
in terms of the Matthews Correlation Coefficient (MCC) which is calculated from the 
2x2 confusion matrix of estimated support and the true support. We observe from this 
chart that, even though measurement rate might be so low as to prevent an accurate 
reconstruction in terms of MSE, the estimated support of the recovered image may 
indeed be highly correlated with the true image. However, the values of the these 
on-support coefficients will not be correctly estimated in this regime. The estimated 
support may still be used for certain tasks in this case, such as classification. More 
visual evidence of this effect can be seen in Fig. were we see, even to the down to 
the extreme limit of a < 0.1, the support of the AMP-RBM recovered images are still 
quite correlated with the true image. For non-i.i.d. AMP-GB, the effect of the prior 
effectively forces the support to the central region of the image, and for i.i.d. AMP-GB, 
the support information is completely lost, having almost zero correlation with the true 
image. 

It is also of interest to analyze the effect of RBM training and model parameters 
on the performance of the proposed approach. Indeed, it is curious to note exactly 
how well trained the RBM must be in order to obtain the performance demonstrated 
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above. In the right panel of Fig. we see that a lightly trained, here on the order of 
40 epochs, RBM attains maximal performance, showing that an overwrought training 
procedure is not necessary in order to obtain significant performance improvements for 
CS reconstruction. Additionally, in the left panel of Fig. we see that increasing 
the complexity of the RBM model, and therefore more accurately estimating the joint 
support probability,increases reconstruction performance, showing that more complex 
RBMs, perhaps even stacked RBM models, may have the potential to further improve 
upon these results. 

Lastly, in terms of computational efficiency and scalability, the inclusion of the inner 
RBM factorization loop does increase the computational burden of the reconstruction 
in proportion to the number of hidden units and the number of factorization iterations 
required for convergence. However, these iterations are computationally light in 
comparison to the AMP FPI and so the use of the RBM support prior is not unduly 
burdensome. 

6. Conclusion 

In this work we show that using an RBM-based prior on the signal support, when learned 
properly, can provide CS reconstruction performance superior to that of both simple 
i.i.d. and empirical non-i.i.d. sparsity assumptions for a message passing approach such 
as AMP. The implications of such an approach are large as these results pave the way 
for the introduction of much more complex and deep-learned priors. Such priors can 
be applied to the signal support as we have done here, or further modifications can be 
made to adapt the AMP framework to the use of RBMs with real-valued visible layers. 
Such priors would even aid in moving past the M = K oracle support transition. 

Additionally, a number of interesting generalizations of our approach are possible. 
While the experiments we present here are only concerned with linear projections 
observed through an AWGN channel, much more general, non-linear, observation models 
can be used moving from AMP to GAMP m- Our approach can be then readily applied 
with essentially no modification to the algorithm. With the successful application of 
statistical physics tools to signal reconstruction, as was done in applying TAP to derive 
AMP, similar approaches could be adapted to produce even better learning algorithms 
for single and stacked RBMs. Perhaps such future works might allow for the estimation 
of the RBM model in parallel with signal reconstruction. 
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